7  Evaluating qPCR assays

7.1 Environmental DNA sampling and processing

Consider this

Give a brief overview of the steps required to detect brook trout using environmental DNA.

Here are the key steps:

  • Collect water samples.
  • Filter water samples.
  • Extract DNA from water samples.
  • Run a quantitative PCR assay using a fluorescent-tagged PCR probe.

Now that we have run our samples we need to convert the raw data collected from the qPCR machine to determine in which samples brook trout eDNA was detected.

7.2 Data Acquisition

# read libraries
library(readr)
library(janitor)
library(lubridate)
library(tidyr)
library(dplyr)
library(glue)
library(ggplot2)
library(patchwork)
library(knitr)

Once the run is complete, export the generated data as an excel file. It will consist of 5 spreadsheets, export each of these as it’s own tab-delimited file and place in the data folder of your `Rproject. You can download these files from our shared google folder here.

Let’s read in our files

Sample Setup: This spreadsheet describes how you set up the plate and indicates which wells contain which standards, unknown samples, negative controls, or are empty (TASK column). It also indicates the Reporter and Quencher used for the probe and for the standards the known Quantity.

There are three types of samples

  • Standards
  • Unknowns
  • no template controls (NTC)1
  • 1 We used nuclease free water.

  • This the only data sheet that links well numbers to sample names for unknowns, concentrations for the standards and links the “task” to each well.

    setup <- read_delim("data/qPCR_brk_2024-11-08_samples.csv", delim = ",", skip = 7) %>%
      clean_names() %>%
      select(well, sample_name, target_name, task, quantity) %>%
      filter(!is.na(task))

    Raw Data: This spreadsheet contains the amount of fluorescent detected for each color filter (blue, green, yellow, red) for each well and cycle. This is your most raw data that is used to identify the reporter (generally FAM). We won’t use this for any calculations, instead we will use the file called Multicomponent Data: This is the amount of fluorescence detected for each reporter based on Raw Data for each cycle and well.

    FAM <- read_delim("data/qPCR_brk_2024-11-08_multicomponent.csv", skip = 7, delim = ",") %>%
      clean_names() %>%
      filter(!is.na(fam)) %>%
      left_join(setup)

    Finally, Amplification Data: This data sheet reports the normalized dye fluorescence for each cycle and well. ΔRn is the magnitude of normalized fluorescence signal generated by the reporter for each cycle. Rn is the fluorescence signal normalized to the fluorescence signal for the passive reference.

    Rn <- read_delim("data/qPCR_brk_2024-11-08_amplification.csv", delim = ",", skip = 7) %>%
      clean_names() %>%
      rename(delta_rn = rn_2) %>%
      filter(!is.na(target_name)) %>%
        select(-target_name) %>%
      left_join(setup)

    Results: This spreadsheet contains the Ct results of the run. Ct2 is the PCR cycle number at which the fluorescence meets the threshold of the amplification plot.

  • 2 C = Cycle, t = threshold

  • # extract threshold of detection
    threshold <- read_delim("data/qPCR_brk_2024-11-08_results.csv", skip = 7, delim = ",") %>%
      clean_names() %>%
      select(ct_threshold) %>%
      slice_max(ct_threshold, n = 1, with_ties = FALSE) %>%
      pull()
    
    # extract baseline cycle values
    baseline <- read_delim("data/qPCR_brk_2024-11-08_results.csv", skip = 7, delim = ",") %>%
      clean_names() %>%
      filter(!is.na(target_name)) %>%
      select(well, baseline_start, baseline_end) %>%
      left_join(setup)
    
    # Cycle threshold values
    Ct <- read_delim("data/qPCR_brk_2024-11-08_results.csv", skip = 7, delim = ",") %>%  
      clean_names() %>%                                             
      select(well, c) %>% 
      mutate(c = as.numeric(c)) %>%
      filter(!is.na(c)) %>%           
      left_join(setup)  

    7.3 Data Processing

    During a qPCR the accumulated target DNA is detected as fluorescence as the probe binds to the specific targeted region as the reaction progresses from cycle to cycle. The qPCR measures fluorescence and processes the data into several inputs to quantify the amount of the target DNA in each sample.

    • The fluorescence is measured as R (raw fluorescence) or Rn (normalized fluorescence) which is determined by dividing the fluorescence of the reporter dye and the passive reference dye.
    • The cycle threshold Ct is the intersection between the amplification curve and the threshold line as a relative measure of the concentration of the target in the reaction.
    • The baseline is the level of signal during the first approx. 5-15 cycles. In this stage there is little change in fluorescence. This signal is used to establish the detected level of background signal (fluorescent noise), this is due to e.g. light leaking into the sample well or the plastic ware. The background noise should be very low compared to the amplified signal of the target sequence. The qPCR machine software determines the baseline as a constant/linear component and removes it from the results.
    • The threshold is the determined as the point at which there is a significant increase of fluorescence above the baseline. It should be set sufficiently high above the baseline to be confident about the amplification level. Setting it too high (in the plateau region of the curve) will lead to inaccuracies.
    Give it a try

    Sketch out on a piece of paper how you expect the fluorescence to change for our three standards with different concentrations, for our negative control, and what a sample should look like if brook trout eDNA is present and if none is detected.

    Discuss with your group whether you think we can determine if brook trout are absent in any of our samples.

    Let’s take a look at the change in fluorescence for all of our wells to make sure that we have a stable increase in the signal and no abrupt changes or dips. The reference dye (rox) should not be affected by the target sequence being amplified while the reporter dye from the probe should show exponential increase.

    FAM %>%
      pivot_longer(names_to = "dye", values_to = "fluorescence", 3:4) %>%
      ggplot(aes(x = cycle, y = fluorescence, color = well)) +
        geom_line() +
        facet_grid(target_name ~ dye) +
        scale_y_log10() +
        theme_classic() +
        theme(legend.position = "blank")

    Figure 7.1: Comparison of change in raw fluorescence for reporter dye (fam) and reference dye (rox) across qPCR cycles.

    This raw data is use to first determine ΔRn which is the magnitude of normalized fluorescence signal generated by the reporter integrated into the probe for each cycle. It is the normalized to the fluorescence signal for the passive reference as Rn.

    The StepOne software that runs the qPCR machine automatically calculates baseline (start and end cycle) and threshold levels assuming that the data exhibits a typical amplification plot with four distinct sections

    1. Plateau phase
    2. Linear phase as plateau is approached
    3. Exponential (geometric) phase characterize by sudden steep increase in fluorescence
    4. Baseline (early cycles)

    The end cycle of the baseline should be set a few cycles before the cycle number where significant fluorescent signal is detected, i.e. before the actual amplification curve begins. We pulled information for baseline cycles and threshold level from the results file when we read it in earlier. This means that we can plot our amplification data. And compare those relationships.

    p1 <- ggplot(Rn, aes(x = cycle, y = delta_rn, color = well)) +
      geom_line() +
      geom_hline(yintercept = threshold, linetype = "dotted", color = "red") + 
      geom_vline(xintercept = c(min(baseline$baseline_end)), linetype = "dotted", color = "blue") +
      geom_vline(xintercept = c(max(baseline$baseline_end)), linetype = "dotted", color = "blue") +
      facet_grid(. ~ target_name) +
      theme_classic() +
      theme(legend.position = "none")
    
    p2 <- ggplot(Rn, aes(x = cycle, y = delta_rn, color = well)) +
      geom_line() +
      geom_hline(yintercept = threshold, linetype = "dotted", color = "red") + 
      geom_vline(xintercept = c(min(baseline$baseline_end)), linetype = "dotted", color = "blue") +
      geom_vline(xintercept = c(max(baseline$baseline_end)), linetype = "dotted", color = "blue") +
      facet_grid(. ~ target_name) +
      scale_y_log10() +
      theme_classic() +
      theme(legend.position = "none")
    
    p1 / p2

    Figure 7.2: Comparison of levels of normalized fluorescence (Rn) across all qPCR cycles. The red dotted line indicates the threshold and the blue dotted lines indicate the minimum and maximum start and end point for cycles used to determine the baseline (level of background noise).

    The threshold is set in the exponential phase of the amplification curve, if it is set too high or too low it will increase the standard deviation of the replicate groups.

    We also want to make sure that our negative controls show little to no signal.

    Rn %>%
      filter(task == "NTC") %>%
      ggplot(aes(x = cycle, y = delta_rn, color = well)) +
      geom_line() +
      geom_hline(yintercept = threshold, linetype = "dotted", color = "red") + 
      geom_vline(xintercept = c(min(baseline$baseline_end)), linetype = "dotted", color = "blue") +
      geom_vline(xintercept = c(max(baseline$baseline_end)), linetype = "dotted", color = "blue") +
      scale_y_log10() +
      theme_classic() +
      theme(legend.position = "bottom")

    Figure 7.3: Levels of fluorescence detected in wells used as negative controls. Red dotted line is the threshold used to determine relative concentrations.

    7.4 Fit standard curve

    The goal of performing a Standard Curve Experiment is to determine the quantity (concentration) of the target in the unknown samples. We have included a serial dilution of standards with known DNA quantities. We can fit a standard curve of threshold cycle Ct for each quantity. Then we can use that equation to calculate quantities in each unknown sample based on their respective Ct values.

    We read in the information on the cycle number for which each well was detected to cross the threshold value. Now, let’s plot the log quantity for each standard compared to the Ct value and add a linear regression. The PCR cycle at which the fluorescence level meets the threshold (threshold cycle Ct) should be > 8 and < 35. Ct < 8 indicates that there is too much template in the reaction, > 35 will likely lead to a higher standard deviation.

    Ct_standard <- Ct %>%
      filter(task == "STANDARD") 
    
    ggplot(Ct_standard, aes(x = log10(quantity), y = c)) +
      geom_hline(yintercept = c(8, 35), linetype = "dotted", color = "red") +
      geom_smooth(method = "lm", color = "black", alpha = .2) +
      geom_point(shape = 21, fill = "darkorange", size = 2) +
      facet_grid(. ~ target_name) +
      labs(x = "log(quantity)", y = "Cycle threshold (Ct)") +
      theme_classic()

    Figure 7.4: Standard curve based on two technical replicates for each standard. The red dotted lines indicate the desired range for Ct.

    Let’s take a look at the accuracy of our measurements.

    kable(
      Ct %>%
        filter(task == "STANDARD") %>%
        group_by(target_name, quantity) %>%
        summarize(mean_Ct = mean(c),
                  sd_Ct = sd(c)) %>%
        arrange(desc(quantity))
    )
    target_name quantity mean_Ct sd_Ct
    BRK2 7e-02 28.23913 0.1248073
    BRK2 7e-03 31.97369 0.1853827
    BRK2 7e-04 35.41940 0.8999762
    Table 7.1: Mean and standard deviation of Ct for each standard.
    Consider this

    How can we use our standards to determine the relative concentrations for our eDNA samples?

    That’s right we need to fit a linear regression to determine the slope and R2 value.

    df <- Ct_standard %>%
      filter(target_name == "BRK2")
    
    # fit linear regression
    score_model <- lm(c ~ log10(quantity), data = df)
    
    # compare results
    summary(score_model)
    
    Call:
    lm(formula = c ~ log10(quantity), data = df)
    
    Residuals:
           1        2        3        4        5        6 
    -0.68452  0.58824  0.22737 -0.03480  0.04011 -0.13639 
    
    Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
    (Intercept)      24.1410     0.5428   44.48 1.53e-06 ***
    log10(quantity)  -3.5901     0.2355  -15.24 0.000108 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 0.4711 on 4 degrees of freedom
    Multiple R-squared:  0.9831,    Adjusted R-squared:  0.9788 
    F-statistic: 232.3 on 1 and 4 DF,  p-value: 0.0001081

    The R2 value (correlation coefficient) measures the goodness of fit of the regression line and the individual Ct data points, generally we are looking for R2 values > 0.95-.99.

    7.5 Calculate relative concentrations

    For our experiment we want to know the relative concentrations of brook trout DNA in our samples. We can do this put plugging the Ct values for each sample into the equation for our standard curve we established using genomic DNA extracted from brook trout fin clips.

    \[ Concentration(ng/\mu l) = 10^{[(C_{t}-yintercept)/slope]} \]

    Ct <- Ct %>%
      mutate(rel_conc = 10^((c-score_model$coefficients[["(Intercept)"]])/score_model$coefficients[["log10(quantity)"]]))

    Let’s take a look at the calculated values for each unknown sample:

    kable(
      Ct %>%
        filter(task == "UNKNOWN") %>%
        mutate(rel_conc = rel_conc*1000) %>%
        select(-task, -quantity) %>%
        arrange(sample_name),
      digits = 2
    )
    well c sample_name target_name rel_conc
    C3 34.52 1 BRK2 1.29
    D3 34.12 1 BRK2 1.66
    E3 34.61 1 BRK2 1.21
    F3 34.80 2 BRK2 1.08
    G3 32.96 2 BRK2 3.49
    H3 33.65 2 BRK2 2.24
    E5 33.27 3 BRK2 2.87
    F5 32.60 3 BRK2 4.41
    G5 32.98 3 BRK2 3.46
    A6 34.17 4 BRK2 1.61
    B6 34.55 4 BRK2 1.26
    H5 32.84 4 BRK2 3.78
    A1 33.59 5 BRK2 2.33
    B1 34.84 5 BRK2 1.05
    C1 33.04 5 BRK2 3.32
    A8 32.47 6 BRK2 4.78
    B8 32.88 6 BRK2 3.68
    C8 32.70 6 BRK2 4.12
    D8 33.56 6 BRK2 2.39
    E8 33.48 6 BRK2 2.50
    F8 33.05 6 BRK2 3.29
    G8 32.56 6 BRK2 4.53
    H7 34.24 6 BRK2 1.54
    A9 32.82 7 BRK2 3.83
    H8 33.10 7 BRK2 3.19
    D1 32.61 8 BRK2 4.36
    E1 32.51 8 BRK2 4.67
    F1 32.73 8 BRK2 4.06
    A10 32.10 9 BRK2 6.06
    B10 32.81 9 BRK2 3.84
    C10 32.78 9 BRK2 3.93
    D10 33.01 10 BRK2 3.39
    E10 32.41 10 BRK2 4.96
    F10 33.39 10 BRK2 2.65
    A11 32.65 11 BRK2 4.26
    G10 33.21 11 BRK2 2.98
    H10 33.41 11 BRK2 2.61
    A4 32.42 12 BRK2 4.93
    B4 32.57 12 BRK2 4.50
    C4 32.24 12 BRK2 5.54
    D4 32.16 13 BRK2 5.83
    E4 32.57 13 BRK2 4.49
    F4 32.66 13 BRK2 4.23
    B11 32.58 14 BRK2 4.47
    C11 32.37 14 BRK2 5.09
    D11 32.39 14 BRK2 5.05
    B9 30.76 15 BRK2 14.34
    C9 31.14 15 BRK2 11.25
    D9 32.20 16 BRK2 5.67
    E9 31.90 16 BRK2 6.89
    F9 31.99 17 BRK2 6.50
    G9 32.00 17 BRK2 6.47
    H9 33.37 18 BRK2 2.69
    H11 32.27 18 BRK2 5.45
    C6 34.44 19 BRK2 1.36
    D6 33.77 19 BRK2 2.08
    E6 34.31 19 BRK2 1.47
    F6 34.70 20 BRK2 1.15
    G6 33.26 20 BRK2 2.89
    H6 33.27 20 BRK2 2.86
    A2 34.20 21 BRK2 1.58
    G1 33.05 21 BRK2 3.29
    H1 36.84 21 BRK2 0.29
    B2 32.80 22 BRK2 3.87
    C2 32.73 22 BRK2 4.05
    D2 32.85 22 BRK2 3.75
    A5 34.75 23 BRK2 1.11
    G4 32.99 23 BRK2 3.42
    H4 33.00 23 BRK2 3.41
    A7 33.06 24 BRK2 3.27
    B7 33.34 24 BRK2 2.74
    C7 33.20 24 BRK2 3.00
    B5 34.20 25 BRK2 1.58
    C5 35.35 25 BRK2 0.75
    D5 34.51 25 BRK2 1.29
    E11 31.70 26 BRK2 7.83
    F11 31.90 26 BRK2 6.90
    G11 31.14 26 BRK2 11.21
    G2 36.97 27 BRK2 0.27
    D7 33.55 28 BRK2 2.40
    E7 33.41 28 BRK2 2.63
    F7 36.07 28 BRK2 0.48
    A3 34.47 29 BRK2 1.32
    B3 34.90 29 BRK2 1.01
    H2 34.49 29 BRK2 1.31
    Table 7.2: Relative concentrations of brook trout DNA (pg/ul) in eDNA samples based on standard curve using brook trout DNA.

    Now that we have a processed data set that we can use for analysis, let’s make sure to write it out to file so that we can access it more easily down the line.

    setup %>% 
      left_join(Ct) %>% 
      filter(task == "UNKNOWN") %>%
      rename(tube_no = sample_name) %>%
      select(tube_no, rel_conc) %>%
      replace(is.na(.), 0) %>% 
      write_delim("data/RAND_eDNA-concentrations.txt", delim = "\t")